Subtopic Segmentation of Scientific Texts: Parameter Optimisation
Similar resources
TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages
TextTiling is a technique for subdividing texts into multi-paragraph units that represent passages, or subtopics. The discourse cues for identifying major subtopic shifts are patterns of lexical co-occurrence and distribution. The algorithm is fully implemented and is shown to produce segmentation that corresponds well to human judgments of the subtopic boundaries of 12 texts. Multi-paragraph s...
Subtopic annotation and automatic segmentation for news texts in Brazilian Portuguese
Subtopic segmentation aims to break documents into subtopical text passages, which develop a main topic in a text. Being capable of automatically detecting subtopics is very useful for several Natural Language Processing applications. For instance, in automatic summarisation, having the subtopics at hand enables the production of summaries with good subtopic coverage. Given the usefulness of su...
Subtopic Annotation in a Corpus of News Texts: Steps Towards Automatic Subtopic Segmentation
Subtopic segmentation aims at finding the boundaries among text passages that represent different subtopics, which usually develop a main topic in a text. Being capable of automatically detecting subtopics is very useful for several Natural Language Processing applications. This paper describes subtopic annotation in a corpus of news texts written in Brazilian Portuguese. In particular, we focu...
Multi-Paragraph Segmentation of Expository Text
This paper describes TextTiling, an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflect the subtopic structure of the texts. The algorithm uses domain-independent lexical frequency and distribution information to recognize the interactions of multiple simultaneous themes. Two fully-implemented versions of the algorithm are described and shown t...
SEGMENTATION OF EXPOSITORY TEXT Marti
This paper describes TextTiling, an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflect the subtopic structure of the texts. The algorithm uses domain-independent lexical frequency and distribution information to recognize the interactions of multiple simultaneous themes. Two fully-implemented versions of the algorithm are described and shown to...